# Initialize Notebook
%run ../library/v1.1.7/init.ipy
HTML('''<script> code_show=true; function code_toggle() { if (code_show){ $('div.input').hide(); } else { $('div.input').show(); } code_show = !code_show } $( document ).ready(code_toggle); </script> <form action="javascript:code_toggle()"><input type="submit" value="Toggle Code"></form>''')
Acute Coronary Syndrome Drug Discovery Analysis
Introduction
This notebook contains an analysis of a user-submitted RNA-seq dataset, created using BioJupies. For more information on BioJupies, please visit http://biojupies.cloud. If the notebook is not displayed correctly in your browser, please visit our Notebook Troubleshooting Guide.
Table of Contents
The notebook is divided into the following sections:
- Load Dataset - Loads and previews the input dataset in the notebook environment
- PCA - Linear dimensionality reduction technique to visualize similarity between samples
- Clustergrammer - Interactive hierarchical clustering heatmap visualization
- Library Size Analysis - Analysis of readcount distribution for the samples within the dataset
- Differential Expression Table - Differential expression analysis between two groups of samples
- Volcano Plot - Plot the logFC and logP values resulting from a differential expression analysis
- MA Plot - Plot the logFC and average expression values resulting from a differential expression analysis
- Enrichr Links - Links to enrichment analysis results of the differentially expressed genes via Enrichr
- Gene Ontology Enrichment Analysis - Identifies Gene Ontology terms which are enriched in the differentially expressed genes
- Pathway Enrichment Analysis - Identifies biological pathways which are enriched in the differentially expressed genes
- Transcription Factor Enrichment Analysis - Identifies transcription factors whose targets are enriched in the differentially expressed genes
- Kinase Enrichment Analysis - Identifies protein kinases whose substrates are enriched in the differentially expressed genes
- L1000CDS2 Query - Identifies small molecules which mimic or reverse a given differential gene expression signature
- L1000FWD Query - Projects signatures on a 2-dimensional visualization of the L1000 signature database
# Load dataset
dataset = load_dataset(source='upload', uid='ET9cTREfrNC')
# Preview expression data
preview_data(dataset)
| Gene | Ctrl1 | Ctrl2 | Ctrl3 | ACS1 | ACS2 | ACS3 | SA1 | SA2 | SA3 |
|---|---|---|---|---|---|---|---|---|---|
| DPM1 | 7.255649 | 7.375353 | 3.367295 | 12.250352 | 7.004579 | 9.096304 | 5.764230 | 7.838643 | 8.848128 |
| SCYL3 | 1.567150 | 1.418937 | 1.364962 | 2.035762 | 1.611222 | 1.720633 | 2.029081 | 1.099374 | 1.578072 |
| C1orf112 | 1.240987 | 0.890164 | 0.971860 | 1.229537 | 0.787981 | 0.724214 | 0.964807 | 0.747368 | 1.193794 |
| FGR | 67.533774 | 49.635314 | 52.748331 | 82.210413 | 36.501235 | 74.861904 | 65.817821 | 56.638353 | 41.740712 |
| FUCA2 | 1.857609 | 2.031012 | 0.711422 | 2.347128 | 3.907391 | 1.996383 | 2.331961 | 1.555359 | 1.147055 |
Table 1 | RNA-seq expression data. The table displays the first 5 rows of the quantified RNA-seq expression dataset. Rows represent genes, columns represent samples, and values show the number of mapped reads.
# Display metadata
display_metadata(dataset)
| Sample | Condition |
|---|---|
| Ctrl1 | Healthy |
| Ctrl2 | Healthy |
| Ctrl3 | Healthy |
| ACS1 | ACS |
| ACS2 | ACS |
| ACS3 | ACS |
| SA1 | Stable_Angina |
| SA2 | Stable_Angina |
| SA3 | Stable_Angina |
Table 2 | Sample metadata. The table displays the metadata associated with the samples in the RNA-seq dataset. Rows represent RNA-seq samples, columns represent metadata categories.
# Configure signatures
dataset['signature_metadata'] = {
    'Control vs Perturbation': {
        'A': ['Ctrl1', 'Ctrl2', 'Ctrl3'],
        'B': ['ACS1', 'ACS2', 'ACS3']
    }
}
# Generate signatures
for label, groups in dataset['signature_metadata'].items():
    signatures[label] = generate_signature(group_A=groups['A'], group_B=groups['B'], method='limma', dataset=dataset)
2. PCA
Principal Component Analysis (PCA) is a statistical technique used to identify global patterns in high-dimensional datasets. It is commonly used to explore the similarity of biological samples in RNA-seq datasets. To achieve this, gene expression values are transformed into Principal Components (PCs), a set of linearly uncorrelated features which represent the most relevant sources of variance in the data, and subsequently visualized using a scatter plot.
# Run analysis
results['pca'] = analyze(dataset=dataset, tool='pca', nr_genes=500, normalization='quantile', z_score=True, plot_type='interactive')
# Display results
plot(results['pca'])
**Figure 1 | Principal Component Analysis results.** The figure displays an interactive, three-dimensional scatter plot of the first three Principal Components (PCs) of the data. Each point represents an RNA-seq sample. Samples with similar gene expression profiles are closer in the three-dimensional space. If provided, sample groups are indicated using different colors, allowing for easier interpretation of the results. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
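The transformation into principal components described in this section can be sketched with NumPy's singular value decomposition. This is a minimal illustration on a small made-up matrix, not the code that `analyze(tool='pca')` runs; the function name `pca_scores`, the genes-by-samples layout, and the toy values are all assumptions made for the example.

```python
import numpy as np

def pca_scores(expression, n_components=3):
    """Project samples onto the first principal components.

    expression: 2-D array, rows = genes, columns = samples (assumed layout).
    Returns (scores, explained_variance_ratio).
    """
    # Transpose so that samples are rows and genes are features.
    X = np.asarray(expression, dtype=float).T
    # Center each gene at zero mean across samples.
    X = X - X.mean(axis=0)
    # SVD of the centered matrix; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Sample coordinates along each principal component.
    scores = U[:, :n_components] * S[:n_components]
    # Fraction of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Toy data: 4 genes x 4 samples (made-up numbers, two obvious groups).
toy = [[1.0, 1.2, 5.0, 5.1],
       [2.0, 2.1, 0.5, 0.4],
       [3.0, 2.9, 3.1, 3.0],
       [0.5, 0.6, 4.0, 4.2]]
scores, explained = pca_scores(toy, n_components=2)
```

In this toy case the first two samples and the last two samples fall on opposite sides of the first component, which is the kind of grouping the scatter plot is meant to reveal.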
3. Clustergrammer
Clustergrammer is a web-based tool for visualizing and analyzing high-dimensional data as interactive and hierarchically clustered heatmaps. It is commonly used to explore the similarity between samples in an RNA-seq dataset. In addition to identifying clusters of samples, it also allows users to identify the genes which contribute to the clustering.
# Run analysis
results['clustergrammer'] = analyze(dataset=dataset, tool='clustergrammer', nr_genes=500, normalization='quantile', z_score=True)
# Display results
plot(results['clustergrammer'])
**Figure 2 | Clustergrammer analysis.** The figure contains an interactive heatmap displaying gene expression for each sample in the RNA-seq dataset. Every row of the heatmap represents a gene, every column represents a sample, and every cell displays normalized gene expression values. The heatmap additionally features color bars beside each column which represent prior knowledge of each sample, such as the tissue of origin or experimental treatment.
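The `nr_genes=500` and `z_score=True` arguments suggest that the heatmap is built from a subset of genes whose values are standardized per gene. The sketch below shows one plausible reading of that preprocessing, assuming genes are ranked by variance; the notebook does not state the selection criterion, and the helper names and toy values are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def top_variable_genes(expression, nr_genes):
    """Return the names of the nr_genes genes with the highest variance.

    expression: dict mapping gene -> list of values, one per sample.
    Ranking by variance is an assumption, not the documented behavior.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:nr_genes]

def z_score_rows(expression):
    """Z-score each gene across samples; constant genes map to zeros."""
    scaled = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled

# Toy data (made-up values): three genes measured in four samples.
toy = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [5.0, 5.5, 5.0, 5.5],
}
selected = top_variable_genes(toy, nr_genes=2)
heatmap_values = z_score_rows({g: toy[g] for g in selected})
```

After z-scoring, every gene has mean 0 and unit variance across samples, so the heatmap colors reflect relative rather than absolute expression.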
4. Library Size Analysis
In order to quantify gene expression in an RNA-seq dataset, reads generated from the sequencing step are mapped to a reference genome and subsequently aggregated into numeric gene counts. Due to experimental variation and random technical noise, samples in an RNA-seq dataset often have variable amounts of total RNA. Library size analysis calculates and displays the total number of reads mapped for each sample in the RNA-seq dataset, facilitating the identification of outlying samples and the assessment of the overall quality of the data.
# Run analysis
results['library_size_analysis'] = analyze(dataset=dataset, tool='library_size_analysis', plot_type='interactive')
# Display results
plot(results['library_size_analysis'])
**Figure 3 | Library Size Analysis results.** The figure contains an interactive bar chart which displays the total number of reads mapped to each RNA-seq sample in the dataset. Additional information for each sample is available by hovering over the bars. If provided, sample groups are indicated using different colors, thus allowing for easier interpretation of the results. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
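The library size of a sample is the sum of its column in the count matrix. The following sketch computes those totals and flags unusually small libraries on made-up counts; the helper names and the 50%-of-median cutoff are illustrative assumptions, not part of the BioJupies tool.

```python
def library_sizes(counts, samples):
    """Return the total of each sample column.

    counts: dict mapping gene -> list of values, one per sample.
    samples: list of sample names, in column order.
    """
    totals = {sample: 0 for sample in samples}
    for values in counts.values():
        for sample, value in zip(samples, values):
            totals[sample] += value
    return totals

def flag_small_libraries(totals, fraction=0.5):
    """Flag samples whose total is below `fraction` of the median total.

    The cutoff is an arbitrary choice for illustration.
    """
    ordered = sorted(totals.values())
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [s for s, t in totals.items() if t < fraction * median]

# Toy counts (made-up values): three genes, three samples.
toy_counts = {
    "GENE_A": [100, 120, 10],
    "GENE_B": [300, 280, 25],
    "GENE_C": [50, 60, 5],
}
toy_samples = ["S1", "S2", "S3"]
totals = library_sizes(toy_counts, toy_samples)
small = flag_small_libraries(totals)
```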
5. Differential Expression Table
Gene expression signatures are alterations in the patterns of gene expression that occur as a result of cellular perturbations such as drug treatments, gene knock-downs or diseases. They can be quantified using differential gene expression (DGE) methods, which compare gene expression between two groups of samples to identify genes whose expression is significantly altered in the perturbation. The signature table is used to interactively display the results of such analyses.
# Initialize results
results['signature_table'] = {}
# Loop through signatures
for label, signature in signatures.items():
    # Run analysis
    results['signature_table'][label] = analyze(signature=signature, tool='signature_table', signature_label=label)
    # Display results
    plot(results['signature_table'][label])
| Gene | logFC | AveExpr | P-value | FDR |
|---|---|---|---|---|
| WDR74 | -7.39 | 5.21 | 0.000097 | 0.179365 |
| SNHG3 | -0.96 | 7.77 | 0.001291 | 0.995675 |
| RHOB | -0.82 | 6.25 | 0.005625 | 0.995675 |
| HIST1H4C | -0.74 | 6.43 | 0.007307 | 0.995675 |
| TUBB2A | -3.09 | 5.32 | 0.008873 | 0.995675 |
| PDZK1IP1 | -1.14 | 7.17 | 0.009075 | 0.995675 |
| RAB7A | 0.60 | 5.71 | 0.011606 | 0.995675 |
| HIST1H2AD | -1.38 | 4.30 | 0.011814 | 0.995675 |
| HIST1H3A | -0.54 | 4.47 | 0.012083 | 0.995675 |
| VIM | 0.62 | 6.80 | 0.012190 | 0.995675 |
| ANXA5 | 0.59 | 4.32 | 0.012207 | 0.995675 |
| TET2 | 0.82 | 4.46 | 0.012900 | 0.995675 |
| GZMH | 0.82 | 5.46 | 0.013602 | 0.995675 |
| CX3CR1 | 0.55 | 5.62 | 0.015798 | 0.995675 |
| HIST2H3D | -0.61 | 5.18 | 0.017067 | 0.995675 |
| CMTM2 | -0.68 | 5.36 | 0.017806 | 0.995675 |
| PRELID1 | -0.50 | 5.21 | 0.018348 | 0.995675 |
| HIST1H4A | -0.76 | 4.38 | 0.018568 | 0.995675 |
| TNFAIP2 | -0.54 | 6.08 | 0.019652 | 0.995675 |
| LILRB3 | -0.52 | 4.89 | 0.019689 | 0.995675 |
| SLC38A5 | -1.02 | 5.73 | 0.020209 | 0.995675 |
| RN7SL1 | -0.71 | 17.53 | 0.020740 | 0.995675 |
| FCHO2 | -0.50 | 5.31 | 0.022525 | 0.995675 |
| RPS3A | -0.62 | 6.79 | 0.023122 | 0.995675 |
| CTA-363E6.6 | -0.69 | 7.38 | 0.026010 | 0.995675 |
| RASSF2 | -0.82 | 6.05 | 0.027415 | 0.995675 |
| VCAN | 0.61 | 4.53 | 0.029072 | 0.995675 |
| TMEM56 | -0.67 | 5.28 | 0.029711 | 0.995675 |
| IL10RA | 0.53 | 4.45 | 0.029950 | 0.995675 |
| ACTR2 | 0.70 | 6.19 | 0.030598 | 0.995675 |
| PIM3 | -0.56 | 5.85 | 0.032119 | 0.995675 |
| RPL41 | -0.58 | 8.69 | 0.033414 | 0.995675 |
| NSUN3 | -0.45 | 4.73 | 0.034350 | 0.995675 |
| IER2 | -0.91 | 5.97 | 0.034686 | 0.995675 |
| AL627309.1 | 0.62 | 6.68 | 0.035337 | 0.995675 |
| FFAR2 | -0.58 | 6.08 | 0.035393 | 0.995675 |
| ZFP36 | -0.84 | 7.86 | 0.035478 | 0.995675 |
| PF4V1 | 0.99 | 6.52 | 0.036937 | 0.995675 |
| NATD1 | -0.76 | 5.24 | 0.037278 | 0.995675 |
| HIST1H3F | -0.61 | 4.49 | 0.037400 | 0.995675 |
| GZMB | 0.48 | 4.61 | 0.037598 | 0.995675 |
| RNF182 | -2.81 | 5.01 | 0.038297 | 0.995675 |
| HIST1H4B | -0.71 | 6.17 | 0.038620 | 0.995675 |
| HK1 | 0.53 | 5.04 | 0.040146 | 0.995675 |
| CKLF | -0.53 | 5.08 | 0.042031 | 0.995675 |
| GUK1 | -0.71 | 6.13 | 0.042417 | 0.995675 |
| HIST1H2AE | -0.59 | 6.32 | 0.045203 | 0.995675 |
| RHBDD1 | -0.51 | 4.44 | 0.045225 | 0.995675 |
| MFSD14B | 0.53 | 4.63 | 0.046181 | 0.995675 |
| SAMD9L | -0.46 | 4.38 | 0.046215 | 0.995675 |
| TPM3 | 0.39 | 4.56 | 0.046376 | 0.995675 |
| SCARNA2 | -0.88 | 9.66 | 0.046846 | 0.995675 |
| STOM | 0.50 | 6.38 | 0.047037 | 0.995675 |
| B2M | 0.79 | 7.65 | 0.047087 | 0.995675 |
| NKG7 | 0.61 | 6.93 | 0.047228 | 0.995675 |
| PTMS | -0.41 | 4.74 | 0.050363 | 0.995675 |
| DEFA1B | -1.12 | 5.38 | 0.051087 | 0.995675 |
| PRKCB | 0.37 | 5.12 | 0.051319 | 0.995675 |
| DEFA1 | -1.13 | 5.40 | 0.051532 | 0.995675 |
| SF3B5 | -0.48 | 4.83 | 0.053363 | 0.995675 |
| IGF2BP2 | 0.38 | 5.26 | 0.053723 | 0.995675 |
| C20orf24 | -0.61 | 5.21 | 0.055030 | 0.995675 |
| RPPH1 | -0.51 | 13.65 | 0.055855 | 0.995675 |
| HIST1H4J | -0.52 | 7.38 | 0.057366 | 0.995675 |
| TERC | -0.80 | 7.00 | 0.058360 | 0.995675 |
| UTRN | 0.40 | 4.84 | 0.058984 | 0.995675 |
| IFITM1 | -0.55 | 7.55 | 0.059893 | 0.995675 |
| SMOX | -0.54 | 4.69 | 0.059907 | 0.995675 |
| HIST1H2AG | -0.50 | 4.32 | 0.060175 | 0.995675 |
| FOS | -0.99 | 6.94 | 0.060585 | 0.995675 |
| YWHAQ | 0.46 | 4.62 | 0.061978 | 0.995675 |
| HBM | -0.48 | 7.46 | 0.063238 | 0.995675 |
| MAN1A1 | 0.46 | 5.00 | 0.063314 | 0.995675 |
| HNRNPUL1 | 0.39 | 5.07 | 0.064120 | 0.995675 |
| IKZF1 | 0.35 | 5.18 | 0.064376 | 0.995675 |
| EIF5A | -0.51 | 5.05 | 0.064523 | 0.995675 |
| GOLPH3 | 0.40 | 4.41 | 0.065036 | 0.995675 |
| B3GNT8 | -0.53 | 4.49 | 0.065898 | 0.995675 |
| TUBA4A | -0.43 | 4.68 | 0.067932 | 0.995675 |
| HBQ1 | -0.46 | 7.04 | 0.068039 | 0.995675 |
| TGOLN2 | 0.33 | 5.25 | 0.069226 | 0.995675 |
| PRPF6 | 0.41 | 4.32 | 0.072203 | 0.995675 |
| RALB | -0.66 | 4.60 | 0.072253 | 0.995675 |
| HIST1H2BG | -0.49 | 5.18 | 0.073367 | 0.995675 |
| PRF1 | 0.53 | 6.53 | 0.073538 | 0.995675 |
| SCARNA10 | -0.59 | 11.67 | 0.074411 | 0.995675 |
| RNF5 | -0.50 | 4.57 | 0.074519 | 0.995675 |
| NUSAP1 | -0.56 | 4.58 | 0.075045 | 0.995675 |
| GID8 | 0.49 | 4.50 | 0.075104 | 0.995675 |
| HIST1H4E | -0.50 | 6.46 | 0.075666 | 0.995675 |
| TBX21 | 0.51 | 4.54 | 0.075714 | 0.995675 |
| TBCEL | -0.65 | 5.25 | 0.076566 | 0.995675 |
| TUG1 | 0.54 | 4.04 | 0.077688 | 0.995675 |
| FCER1G | -0.57 | 7.12 | 0.077887 | 0.995675 |
| SAMD9 | -0.34 | 4.97 | 0.079422 | 0.995675 |
| RPS26 | -1.33 | 4.84 | 0.080532 | 0.995675 |
| LYL1 | -0.59 | 7.01 | 0.081213 | 0.995675 |
| C9orf78 | 0.47 | 6.45 | 0.082994 | 0.995675 |
| ARF1 | 0.40 | 6.76 | 0.083932 | 0.995675 |
| SKAP2 | -0.51 | 4.61 | 0.084574 | 0.995675 |
**Table 3 | Differential Expression Table.** The browsable table contains the gene expression signature generated from a differential gene expression analysis. Every row of the table represents a gene; the columns display the estimated measures of differential expression. Links to external resources containing additional information for each gene are also provided.
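Many rows of Table 3 share the same FDR value even though their P-values differ. That pattern is what a Benjamini-Hochberg style adjustment produces, because adjusted values are forced to be non-decreasing in the P-value ranking. The sketch below assumes that the FDR column was computed this way, which the notebook does not state explicitly; the function name and the toy P-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (made up): the three middle values end up with the same FDR.
pvals = [0.001, 0.02, 0.03, 0.04, 0.5]
fdr = benjamini_hochberg(pvals)
```

Because the adjustment scales with the number of tests, a raw P-value below 0.05 can still correspond to an FDR close to 1 when thousands of genes are tested, as in the table above.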
6. Volcano Plot
Volcano plots are a type of scatter plot commonly used to display the results of a differential gene expression analysis. They can be used to quickly identify genes whose expression is significantly altered in a perturbation, and to assess the global similarity of gene expression in two groups of biological samples. Each point in the scatter plot represents a gene; the axes display the significance versus fold-change estimated by the differential expression analysis.
# Initialize results
results['volcano_plot'] = {}
# Loop through signatures
for label, signature in signatures.items():
    # Run analysis
    results['volcano_plot'][label] = analyze(signature=signature, tool='volcano_plot', signature_label=label, pvalue_threshold=0.05, logfc_threshold=1.5, plot_type='interactive')
    # Display results
    plot(results['volcano_plot'][label])
**Figure 4 | Volcano Plot.** The figure contains an interactive scatter plot which displays the log2-fold changes and statistical significance of each gene calculated by performing a differential gene expression analysis. Every point in the plot represents a gene. Red points indicate significantly up-regulated genes; blue points indicate significantly down-regulated genes. Additional information for each gene is available by hovering over it. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
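The coloring of a volcano plot comes down to two thresholds. The sketch below applies the `pvalue_threshold=0.05` and `logfc_threshold=1.5` values from the call above to a few rows copied from Table 3. It assumes the thresholds are applied to the raw P-value and to the absolute logFC, which the notebook does not confirm; the function names are hypothetical.

```python
import math

def classify_gene(logfc, pvalue, pvalue_threshold=0.05, logfc_threshold=1.5):
    """Label a gene as 'up', 'down', or 'ns' (not significant).

    Assumes raw P-values and an absolute logFC cutoff.
    """
    if pvalue >= pvalue_threshold or abs(logfc) < logfc_threshold:
        return "ns"
    return "up" if logfc > 0 else "down"

def volcano_coordinates(logfc, pvalue):
    """Return the (x, y) position of a gene: logFC versus -log10(P-value)."""
    return logfc, -math.log10(pvalue)

# Rows copied from Table 3: gene -> (logFC, P-value).
rows = {
    "WDR74": (-7.39, 0.000097),
    "TUBB2A": (-3.09, 0.008873),
    "RAB7A": (0.60, 0.011606),
    "SKAP2": (-0.51, 0.084574),
}
labels = {gene: classify_gene(*values) for gene, values in rows.items()}
```

Under these assumptions WDR74 and TUBB2A would be drawn as down-regulated, while RAB7A passes the P-value cutoff but not the fold-change cutoff.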
7. MA Plot
MA plots are a type of scatter plot commonly used to display the results of a differential gene expression analysis. They can be used to quickly identify genes whose expression is significantly altered in a perturbation, and to assess the global similarity of gene expression in two groups of biological samples. Each point in the scatter plot represents a gene; the axes display the average gene expression versus fold-change estimated by the differential expression analysis.
# Initialize results
results['ma_plot'] = {}
# Loop through signatures
for label, signature in signatures.items():
    # Run analysis
    results['ma_plot'][label] = analyze(signature=signature, tool='ma_plot', signature_label=label, pvalue_threshold=0.05, logfc_threshold=1, plot_type='interactive')
    # Display results
    plot(results['ma_plot'][label])
**Figure 5 | MA Plot.** The figure contains an interactive scatter plot which displays the average expression and log2-fold change of each gene calculated by performing differential gene expression analysis. Every point in the plot represents a gene. Red points indicate significantly up-regulated genes; blue points indicate significantly down-regulated genes. Additional information for each gene is available by hovering over it. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
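An MA plot places each gene by its average expression (A) and its log fold change between the two groups (M). The sketch below computes both from log2 expression values with plain means. It is only an illustration: the logFC and AveExpr values in this notebook come from limma, whose estimates may differ, and the function name and toy values are assumptions.

```python
from statistics import mean

def ma_values(log2_group_a, log2_group_b):
    """Return (M, A) for one gene from log2 expression values.

    M: difference of group means (group B minus group A).
    A: mean expression across all samples in both groups.
    """
    m = mean(log2_group_b) - mean(log2_group_a)
    a = mean(list(log2_group_a) + list(log2_group_b))
    return m, a

# Toy log2 values (made up): three control and three perturbation samples.
m_value, a_value = ma_values([4.0, 5.0, 6.0], [6.0, 7.0, 8.0])
```

A positive M means higher expression in group B, which corresponds to the perturbation group in the signature configured earlier.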
8. Enrichr Links
Enrichment analysis is a statistical procedure used to identify biological terms which are over-represented in a given gene set. These include signaling pathways, molecular functions, diseases, and a wide variety of other biological terms obtained by integrating prior knowledge of gene function from multiple resources. Enrichr is a web-based application which allows users to perform enrichment analysis using a large collection of gene-set libraries and various interactive approaches to display enrichment results.
# Initialize results
results['enrichr'] = {}
# Loop through signatures
for label, signature in signatures.items():
    # Run analysis
    results['enrichr'][label] = analyze(signature=signature, tool='enrichr', signature_label=label, geneset_size=500, sort_genes_by='t')
    # Display results
    plot(results['enrichr'][label])
Control vs Perturbation Signature:
**Table 4 | Enrichr links.** The table displays links to Enrichr containing the results of enrichment analyses generated by analyzing the up-regulated and down-regulated genes from a differential expression analysis. By clicking on these links, users can interactively explore and download the enrichment results from the Enrichr website.
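Over-representation of a term in a gene set is commonly assessed with a hypergeometric (one-sided Fisher exact) test. The sketch below computes that tail probability with the standard library only. It illustrates the general idea rather than Enrichr's own scoring, and the function name, parameter names, and background size are assumptions.

```python
from math import comb

def enrichment_pvalue(overlap, gene_set_size, term_size, background_size):
    """P(X >= overlap) when drawing gene_set_size genes without replacement
    from a background in which term_size genes are annotated to the term."""
    max_overlap = min(gene_set_size, term_size)
    total = comb(background_size, gene_set_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, gene_set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Small worked case: background of 10 genes, 4 annotated to the term,
# 3 genes submitted, 2 of them annotated.
p_small = enrichment_pvalue(overlap=2, gene_set_size=3, term_size=4, background_size=10)

# Larger hypothetical case matching geneset_size=500, with an assumed
# background of 20000 genes and a term of 200 genes sharing 25 with the input.
p_large = enrichment_pvalue(overlap=25, gene_set_size=500, term_size=200, background_size=20000)
```

In the small case the probability is (6·6 + 4·1) / 120 = 1/3, so an overlap of 2 would not be surprising. In the larger case the expected overlap by chance is 5, so an overlap of 25 yields a very small P-value.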
9. Gene Ontology Enrichment Analysis
Gene Ontology (GO) is a major bioinformatics initiative aimed at unifying the representation of gene attributes across all species. It contains a large collection of experimentally validated and predicted associations between genes and biological terms. This information can be leveraged by Enrichr to identify the biological processes, molecular functions and cellular components which are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.
# Initialize results
results['go_enrichment'] = {}
# Loop through results
for label, enrichr_results in results['enrichr'].items():
    # Run analysis
    results['go_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='go_enrichment', signature_label=label, plot_type='interactive', go_version=2025, sort_results_by='pvalue')
    # Display results
    plot(results['go_enrichment'][label])
**Figure 6 | Gene Ontology Enrichment Analysis Results.** The figure contains interactive bar charts displaying the results of the Gene Ontology enrichment analysis generated using Enrichr. The x axis indicates the -log10(P-value) for each term. Significant terms are highlighted in bold. Additional information about enrichment results is available by hovering over each bar. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
10. Pathway Enrichment Analysis
Biological pathways are sequences of interactions between biochemical compounds which play a key role in determining cellular behavior. Databases such as KEGG, Reactome and WikiPathways contain a large number of associations between such pathways and genes. This information can be leveraged by Enrichr to identify the biological pathways which are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.
# Initialize results
results['pathway_enrichment'] = {}
# Loop through results
for label, enrichr_results in results['enrichr'].items():
    # Run analysis
    results['pathway_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='pathway_enrichment', signature_label=label, plot_type='interactive', sort_results_by='pvalue')
    # Display results
    plot(results['pathway_enrichment'][label])
**Figure 7 | Pathway Enrichment Analysis Results.** The figure contains interactive bar charts displaying the results of the pathway enrichment analysis generated using Enrichr. The x axis indicates the -log10(P-value) for each term. Significant terms are highlighted in bold. Additional information about enrichment results is available by hovering over each bar. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
11. Transcription Factor Enrichment Analysis
Transcription Factors (TFs) are proteins involved in the transcriptional regulation of gene expression. Databases such as ChEA and ENCODE contain a large number of associations between TFs and their transcriptional targets. This information can be leveraged by Enrichr to identify the transcription factors whose targets are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.
# Initialize results
results['tf_enrichment'] = {}
# Loop through results
for label, enrichr_results in results['enrichr'].items():
    # Run analysis
    results['tf_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='tf_enrichment', signature_label=label)
    # Display results
    plot(results['tf_enrichment'][label])
A. ChEA (experimentally validated targets)
| Rank | Transcription Factor | P-value | FDR | Target |
|---|---|---|---|---|
| 1 | AF4 26711339 ChIP-Seq SEM Human Blood Leukemia* | 5.476736e-44 | 4.107552e-41 | 200 upregulated targets |
| 2 | FOXO1 25302145 ChIP-Seq T-LYMPHOCYTE Mouse* | 1.757057e-30 | 6.588963e-28 | 123 upregulated targets |
| 3 | HNF1A 27111144 ChIP-Seq CD8+TCells Mouse Blood Lymphoma* | 5.512847e-30 | 1.378212e-27 | 180 upregulated targets |
| 4 | RUNX1 21571218 ChIP-Seq MEGAKARYOCYTES Human* | 5.790235e-27 | 1.085669e-24 | 209 upregulated targets |
| 5 | MYB 26560356 Chip-Seq TH2 Human* | 1.104989e-25 | 1.657484e-23 | 116 upregulated targets |
| 6 | UTX 26944678 Chip-Seq JUKART Human* | 9.332437e-25 | 1.166555e-22 | 114 upregulated targets |
| 7 | CREM 20920259 ChIP-Seq GC1-SPG Mouse* | 1.180260e-23 | 1.264564e-21 | 216 upregulated targets |
| 8 | SPI1 23547873 ChIP-Seq NB4 Human* | 4.473663e-22 | 4.194059e-20 | 150 upregulated targets |
| 9 | MECOM 23826213 ChIP-Seq KASUMI Mouse* | 5.815919e-22 | 4.846600e-20 | 105 upregulated targets |
| 10 | SMRT 22465074 ChIP-Seq MACROPHAGES Mouse* | 2.322321e-21 | 1.741741e-19 | 106 upregulated targets |
| 11 | TAL1 20566737 ChIP-Seq PRIMARY FETAL LIVER ERYTHROID Mouse* | 3.979905e-21 | 2.713571e-19 | 102 upregulated targets |
| 12 | KDM2B 26808549 Chip-Seq HPB-ALL Human* | 3.199896e-20 | 1.999935e-18 | 106 upregulated targets |
| 13 | MYB 26560356 Chip-Seq TH1 Human* | 9.906695e-20 | 5.715401e-18 | 104 upregulated targets |
| 14 | FLI1 21571218 ChIP-Seq MEGAKARYOCYTES Human* | 1.070867e-19 | 5.736788e-18 | 210 upregulated targets |
| 15 | ENL 26711339 ChIP-Seq SEM Human Blood Leukemia* | 1.894027e-19 | 9.470134e-18 | 100 upregulated targets |
| 16 | GATA3 27048872 Chip-Seq THYMUS Human* | 1.315219e-18 | 6.165089e-17 | 102 upregulated targets |
| 17 | NCOR 22465074 ChIP-Seq MACROPHAGES Mouse* | 2.290286e-18 | 1.010420e-16 | 101 upregulated targets |
| 18 | STAT3 20064451 ChIP-Seq CD4+T Mouse* | 9.908748e-18 | 4.128645e-16 | 68 upregulated targets |
| 19 | EKLF 21900194 ChIP-Seq ERYTHROCYTE Mouse* | 1.301390e-18 | 4.828156e-16 | 79 downregulated targets |
| 20 | NCOR1 26117541 ChIP-Seq K562 Human* | 6.399390e-17 | 2.526075e-15 | 99 upregulated targets |
| 21 | KDM2B 26808549 Chip-Seq SUP-B15 Human* | 9.090108e-17 | 3.408790e-15 | 100 upregulated targets |
| 22 | CREB1 20920259 ChIP-Seq GC1-SPG Mouse* | 1.182157e-16 | 4.221989e-15 | 128 upregulated targets |
| 23 | MAF 26560356 Chip-Seq TH1 Human* | 2.173344e-17 | 5.375403e-15 | 100 downregulated targets |
| 24 | TCF7 22412390 ChIP-Seq EML Mouse* | 7.534101e-17 | 1.118061e-14 | 99 downregulated targets |
| 25 | BRD4 27068464 Chip-Seq AML-cells Mouse* | 3.946094e-16 | 1.345259e-14 | 96 upregulated targets |
| 26 | KDM2B 26808549 Chip-Seq SIL-ALL Human* | 1.211862e-15 | 3.787068e-14 | 97 upregulated targets |
| 27 | ATF3 23680149 ChIP-Seq GBM1-GSC Human* | 1.211862e-15 | 3.787068e-14 | 97 upregulated targets |
| 28 | RUNX1 22412390 ChIP-Seq EML Mouse* | 2.199684e-15 | 6.599051e-14 | 93 upregulated targets |
| 29 | ELK3 25401928 ChIP-Seq HUVEC Human* | 6.413917e-16 | 7.931877e-14 | 99 downregulated targets |
| 30 | KDM2B 26808549 Chip-Seq DND41 Human* | 4.085270e-15 | 1.178443e-13 | 96 upregulated targets |
| 31 | CREB1 23762244 ChIP-Seq HIPPOCAMPUS Rat* | 7.871221e-15 | 2.186450e-13 | 108 upregulated targets |
| 32 | AF4 28076791 ChIP-Seq SEM Human Blood Leukemia* | 1.045512e-14 | 2.800478e-13 | 54 upregulated targets |
| 33 | GATA1 19941827 ChIP-Seq MEL86 Mouse* | 2.655031e-14 | 6.866459e-13 | 89 upregulated targets |
| 34 | E2A 27217539 Chip-Seq RAMOS-Cell Line Human* | 8.219155e-14 | 1.988505e-12 | 91 upregulated targets |
| 35 | RUNX2 24655370 ChIP-Seq MC3T3E1 Mouse Bone* | 9.546292e-14 | 2.237412e-12 | 181 upregulated targets |
| 36 | TAL1 20887958 ChIP-Seq HPC-7 Mouse* | 1.103101e-13 | 1.023126e-11 | 90 downregulated targets |
| 37 | STAT4 19710469 ChIP-ChIP TH1 Mouse* | 5.504823e-13 | 1.251096e-11 | 67 upregulated targets |
| 38 | MAF 26560356 Chip-Seq TH2 Human* | 1.313236e-12 | 2.814076e-11 | 89 upregulated targets |
| 39 | BRD4 28847988 ChIP-Seq BCBL1 Human Blood Lymphoma* | 2.025993e-12 | 4.220819e-11 | 27 upregulated targets |
| 40 | MYB 21317192 ChIP-Seq ERMYB Mouse* | 2.117441e-12 | 4.292111e-11 | 55 upregulated targets |
| 41 | VDR 24763502 ChIP-Seq THP-1 Human* | 2.243678e-12 | 4.428312e-11 | 66 upregulated targets |
| 42 | SPI1 22096565 ChIP-ChIP GC-B Mouse* | 6.883478e-13 | 4.643219e-11 | 67 downregulated targets |
| 43 | PPARG 20887899 ChIP-Seq 3T3-L1 Mouse* | 3.043126e-12 | 5.852165e-11 | 127 upregulated targets |
| 44 | GATA1 22383799 ChIP-Seq G1ME Mouse* | 4.186823e-12 | 7.850293e-11 | 87 upregulated targets |
| 45 | GATA2 19941826 ChIP-Seq K562 Human* | 2.470789e-12 | 1.527771e-10 | 88 downregulated targets |
| 46 | NUCKS1 24931609 ChIP-Seq HEPATOCYTES Mouse* | 1.025981e-11 | 1.832109e-10 | 45 upregulated targets |
| 47 | SPI1 22790984 ChIP-Seq ERYTHROLEUKEMIA Mouse* | 3.381096e-12 | 1.929825e-10 | 86 downregulated targets |
| 48 | YY1 26981420 ChIP-Seq C2C12 Mouse Muscle* | 1.313130e-11 | 2.290343e-10 | 93 upregulated targets |
| 49 | CEBPB 20176806 ChIP-Seq MACROPHAGES Mouse* | 2.382947e-11 | 4.026335e-10 | 81 upregulated targets |
| 50 | KDM2B 26808549 Chip-Seq JURKAT Human* | 2.415801e-11 | 4.026335e-10 | 87 upregulated targets |
B. ENCODE (experimentally validated targets)
| Rank | Transcription Factor | P-value | FDR | Target |
|---|---|---|---|---|
| 1 | POLR2AphosphoS5 G1E-ER4 mm9* | 2.716623e-24 | 2.214047e-21 | 126 downregulated targets |
| 2 | TAF7 K562 hg19* | 5.843769e-23 | 1.769709e-20 | 74 downregulated targets |
| 3 | NELFE K562 hg19* | 6.514265e-23 | 1.769709e-20 | 48 downregulated targets |
| 4 | KAT2A GM12878 hg19* | 4.406447e-22 | 8.978135e-20 | 118 downregulated targets |
| 5 | KAT2A HeLa-S3 hg19* | 9.615575e-22 | 1.567339e-19 | 87 downregulated targets |
| 6 | RELA GM18505 hg19* | 2.590055e-19 | 3.518159e-17 | 118 downregulated targets |
| 7 | RELA GM18526 hg19* | 2.790626e-17 | 3.249087e-15 | 67 downregulated targets |
| 8 | RELA GM12878 hg19* | 5.992928e-17 | 6.105295e-15 | 64 downregulated targets |
| 9 | CEBPD K562 hg19* | 1.352037e-16 | 1.224345e-14 | 59 downregulated targets |
| 10 | ETS1 MEL cell line mm9* | 2.604922e-17 | 2.125617e-14 | 110 upregulated targets |
| 11 | IKZF1 GM12878 hg19* | 1.293902e-16 | 5.279121e-14 | 112 upregulated targets |
| 12 | CEBPB GM12878 hg19* | 1.124576e-15 | 9.165296e-14 | 55 downregulated targets |
| 13 | ZMIZ1 MEL cell line mm9* | 1.302623e-15 | 9.651254e-14 | 87 downregulated targets |
| 14 | POLR2AphosphoS5 MEL cell line mm9* | 5.023599e-15 | 3.411861e-13 | 131 downregulated targets |
| 15 | NCOR1 K562 hg19* | 6.428200e-15 | 1.748470e-12 | 108 upregulated targets |
| 16 | CHD1 CH12.LX mm9* | 1.655810e-14 | 2.702283e-12 | 107 upregulated targets |
| 17 | RELA GM12891 hg19* | 1.655810e-14 | 2.702283e-12 | 107 upregulated targets |
| 18 | BCLAF1 K562 hg19* | 7.077089e-14 | 4.436791e-12 | 68 downregulated targets |
| 19 | TAL1 MEL cell line mm9* | 6.091471e-13 | 8.284401e-11 | 113 upregulated targets |
| 20 | TAF1 MCF-7 hg19* | 4.976099e-12 | 2.896801e-10 | 90 downregulated targets |
| 21 | RELA GM12892 hg19* | 3.417010e-12 | 3.485350e-10 | 76 upregulated targets |
| 22 | STAT3 HeLa-S3 hg19* | 2.255815e-11 | 1.149056e-09 | 98 downregulated targets |
| 23 | SPI1 GM12878 hg19* | 2.660357e-11 | 1.275406e-09 | 101 downregulated targets |
| 24 | GATA1 erythroblast mm9* | 2.068906e-11 | 1.875808e-09 | 98 upregulated targets |
| 25 | POLR2A liver mm9* | 1.073250e-10 | 4.373494e-09 | 97 downregulated targets |
| 26 | RELA GM19193 hg19* | 1.073250e-10 | 4.373494e-09 | 97 downregulated targets |
| 27 | CHD1 MEL cell line mm9* | 5.470044e-11 | 4.463556e-09 | 49 upregulated targets |
| 28 | ZNF274 K562 hg19* | 2.918737e-10 | 1.132748e-08 | 59 downregulated targets |
| 29 | SPI1 HL-60 hg19* | 3.748289e-10 | 1.388571e-08 | 69 downregulated targets |
| 30 | POLR2AphosphoS2 A549 hg19* | 5.325321e-10 | 1.887016e-08 | 95 downregulated targets |
| 31 | EP300 MEL cell line mm9* | 5.325321e-10 | 2.896975e-08 | 95 upregulated targets |
| 32 | UBTF MEL cell line mm9* | 5.325321e-10 | 2.896975e-08 | 95 upregulated targets |
| 33 | EP300 CH12.LX mm9* | 5.325321e-10 | 2.896975e-08 | 95 upregulated targets |
| 34 | STAT2 K562 hg19* | 1.150197e-09 | 3.791522e-08 | 27 downregulated targets |
| 35 | GTF2B K562 hg19* | 1.163044e-09 | 3.791522e-08 | 94 downregulated targets |
| 36 | TAL1 G1E-ER4 mm9* | 1.418953e-09 | 4.447871e-08 | 51 downregulated targets |
| 37 | CHD1 IMR-90 hg19* | 1.163044e-09 | 5.582609e-08 | 94 upregulated targets |
| 38 | BCL11A GM12878 hg19* | 2.025461e-09 | 5.917632e-08 | 65 downregulated targets |
| 39 | GATA1 erythroblast hg19* | 2.033051e-09 | 5.917632e-08 | 102 downregulated targets |
| 40 | GATA1 G1E-ER4 mm9* | 1.544295e-09 | 7.000803e-08 | 154 upregulated targets |
| 41 | NFATC1 GM12878 hg19* | 2.760804e-09 | 7.758810e-08 | 77 downregulated targets |
| 42 | RELA GM10847 hg19* | 1.118391e-08 | 2.848402e-07 | 91 downregulated targets |
| 43 | IRF1 K562 hg19* | 2.396682e-08 | 5.744987e-07 | 167 downregulated targets |
| 44 | TAL1 erythroblast mm9* | 2.660270e-08 | 6.194628e-07 | 51 downregulated targets |
| 45 | TCF3 myocyte mm9* | 1.701514e-08 | 6.311069e-07 | 90 upregulated targets |
| 46 | SPI1 GM12891 hg19* | 2.764268e-08 | 9.807144e-07 | 58 upregulated targets |
| 47 | POLR2A kidney mm9* | 4.726796e-08 | 1.041173e-06 | 89 downregulated targets |
| 48 | MEF2C GM12878 hg19* | 6.799267e-08 | 1.458264e-06 | 18 downregulated targets |
| 49 | TAF1 GM12878 hg19* | 7.684761e-08 | 1.605918e-06 | 65 downregulated targets |
| 50 | SP1 K562 hg19* | 8.168863e-08 | 1.664406e-06 | 63 downregulated targets |
C. ARCHS4 (coexpressed genes)
| Rank | Transcription Factor | P-value | FDR | Target |
|---|---|---|---|---|
| 1 | ZNF467 human tf ARCHS4 coexpression* | 2.269103e-84 | 3.439961e-81 | 99 downregulated targets |
| 2 | BCL6 human tf ARCHS4 coexpression* | 2.856892e-80 | 1.082762e-77 | 96 downregulated targets |
| 3 | RARA human tf ARCHS4 coexpression* | 2.856892e-80 | 1.082762e-77 | 96 downregulated targets |
| 4 | DHX34 human tf ARCHS4 coexpression* | 2.856892e-80 | 1.082762e-77 | 96 downregulated targets |
| 5 | SPI1 human tf ARCHS4 coexpression* | 6.414466e-79 | 1.620722e-76 | 95 downregulated targets |
| 6 | SNAI3 human tf ARCHS4 coexpression* | 6.414466e-79 | 1.620722e-76 | 95 downregulated targets |
| 7 | IRF9 human tf ARCHS4 coexpression* | 6.520025e-75 | 1.235545e-72 | 92 downregulated targets |
| 8 | TIGD3 human tf ARCHS4 coexpression* | 6.520025e-75 | 1.235545e-72 | 92 downregulated targets |
| 9 | TFEB human tf ARCHS4 coexpression* | 1.362245e-73 | 2.294626e-71 | 91 downregulated targets |
| 10 | LYL1 human tf ARCHS4 coexpression* | 1.113436e-69 | 1.406641e-67 | 88 downregulated targets |
| 11 | IRF2 human tf ARCHS4 coexpression* | 1.113436e-69 | 1.406641e-67 | 88 downregulated targets |
| 12 | RNF166 human tf ARCHS4 coexpression* | 1.113436e-69 | 1.406641e-67 | 88 downregulated targets |
| 13 | USF1 human tf ARCHS4 coexpression* | 2.161708e-68 | 2.184766e-66 | 87 downregulated targets |
| 14 | STAT5B human tf ARCHS4 coexpression* | 2.161708e-68 | 2.184766e-66 | 87 downregulated targets |
| 15 | ZNF319 human tf ARCHS4 coexpression* | 2.161708e-68 | 2.184766e-66 | 87 downregulated targets |
| 16 | NFE2 human tf ARCHS4 coexpression* | 4.119596e-67 | 3.469615e-65 | 86 downregulated targets |
| 17 | ZBTB7B human tf ARCHS4 coexpression* | 4.119596e-67 | 3.469615e-65 | 86 downregulated targets |
| 18 | STAT6 human tf ARCHS4 coexpression* | 4.119596e-67 | 3.469615e-65 | 86 downregulated targets |
| 19 | RXRA human tf ARCHS4 coexpression* | 1.414385e-64 | 1.072104e-62 | 84 downregulated targets |
| 20 | ZBP1 human tf ARCHS4 coexpression* | 1.414385e-64 | 1.072104e-62 | 84 downregulated targets |
| 21 | IRF1 human tf ARCHS4 coexpression* | 2.547587e-63 | 1.609226e-61 | 83 downregulated targets |
| 22 | IRF7 human tf ARCHS4 coexpression* | 2.547587e-63 | 1.609226e-61 | 83 downregulated targets |
| 23 | ZBTB48 human tf ARCHS4 coexpression* | 2.547587e-63 | 1.609226e-61 | 83 downregulated targets |
| 24 | SEMA4A human tf ARCHS4 coexpression* | 2.547587e-63 | 1.609226e-61 | 83 downregulated targets |
| 25 | ZNF524 human tf ARCHS4 coexpression* | 4.502238e-62 | 2.625151e-60 | 82 downregulated targets |
| 26 | TRAFD1 human tf ARCHS4 coexpression* | 4.502238e-62 | 2.625151e-60 | 82 downregulated targets |
| 27 | AKNA human tf ARCHS4 coexpression* | 1.327508e-59 | 7.453710e-58 | 80 downregulated targets |
| 28 | MXD1 human tf ARCHS4 coexpression* | 2.214316e-58 | 1.118968e-56 | 79 downregulated targets |
| 29 | ELF4 human tf ARCHS4 coexpression* | 2.214316e-58 | 1.118968e-56 | 79 downregulated targets |
| 30 | ZNF710 human tf ARCHS4 coexpression* | 2.214316e-58 | 1.118968e-56 | 79 downregulated targets |
| 31 | ZNF746 human tf ARCHS4 coexpression* | 3.622156e-57 | 1.771351e-55 | 78 downregulated targets |
| 32 | TSC22D4 human tf ARCHS4 coexpression* | 9.136189e-55 | 4.328270e-53 | 76 downregulated targets |
| 33 | NR1H2 human tf ARCHS4 coexpression* | 1.408363e-53 | 6.469933e-52 | 75 downregulated targets |
| 34 | KLF2 human tf ARCHS4 coexpression* | 3.150631e-51 | 1.404811e-49 | 73 downregulated targets |
| 35 | FOXO4 human tf ARCHS4 coexpression* | 4.570861e-50 | 1.924840e-48 | 72 downregulated targets |
| 36 | MXD3 human tf ARCHS4 coexpression* | 4.570861e-50 | 1.924840e-48 | 72 downregulated targets |
| 37 | MBNL1 human tf ARCHS4 coexpression* | 3.150631e-51 | 4.861424e-48 | 73 upregulated targets |
| 38 | FLI1 human tf ARCHS4 coexpression* | 6.496490e-49 | 2.661805e-47 | 71 downregulated targets |
| 39 | ASCL2 human tf ARCHS4 coexpression* | 9.044150e-48 | 3.427733e-46 | 70 downregulated targets |
| 40 | PARP12 human tf ARCHS4 coexpression* | 9.044150e-48 | 3.427733e-46 | 70 downregulated targets |
| 41 | SSH2 human tf ARCHS4 coexpression* | 9.044150e-48 | 3.427733e-46 | 70 downregulated targets |
| 42 | NFYC human tf ARCHS4 coexpression* | 1.233077e-46 | 4.559379e-45 | 69 downregulated targets |
| 43 | PLXNC1 human tf ARCHS4 coexpression* | 1.646150e-45 | 5.941819e-44 | 68 downregulated targets |
| 44 | BATF2 human tf ARCHS4 coexpression* | 2.151422e-44 | 7.247901e-43 | 67 downregulated targets |
| 45 | SP110 human tf ARCHS4 coexpression* | 2.151422e-44 | 7.247901e-43 | 67 downregulated targets |
| 46 | ZNF276 human tf ARCHS4 coexpression* | 2.151422e-44 | 7.247901e-43 | 67 downregulated targets |
| 47 | MKRN1 human tf ARCHS4 coexpression* | 2.151422e-44 | 1.659822e-41 | 67 upregulated targets |
| 48 | ATXN7 human tf ARCHS4 coexpression* | 2.752176e-43 | 1.415536e-40 | 66 upregulated targets |
| 49 | RELB human tf ARCHS4 coexpression* | 4.220046e-41 | 1.361189e-39 | 64 downregulated targets |
| 50 | SP2 human tf ARCHS4 coexpression* | 4.220046e-41 | 1.361189e-39 | 64 downregulated targets |
** Table 5 | Transcription Factor Enrichment Analysis Results. **The figure contains scrollable tables displaying the results of the Transcription Factor (TF) enrichment analysis generated using Enrichr. Every row represents a TF; significant TFs are highlighted in bold. A and B display results generated using ChEA and ENCODE libraries, indicating TFs whose experimentally validated targets are enriched. C displays results generated using the ARCHS4 library, indicating TFs whose top coexpressed genes (according to the ARCHS4 dataset) are enriched.
12. Kinase Enrichment Analysis¶
Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups. Databases such as KEA contain a large number of associations between kinases and their substrates. This information can be leveraged by Enrichr to identify the protein kinases whose substrates are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.
# Initialize results
results['kinase_enrichment'] = {}
# Loop through results
for label, enrichr_results in results['enrichr'].items():
    # Run analysis
    results['kinase_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='kinase_enrichment', signature_label=label)
    # Display results
    plot(results['kinase_enrichment'][label])
A. KEA (experimentally validated targets)¶
| Rank | Kinase | P-value | FDR | Substrate |
|---|---|---|---|---|
| 1 | AKT1* | 3.216180e-12 | 6.496684e-10 | 27 upregulated substrates |
| 2 | CDK2* | 5.971464e-10 | 6.031178e-08 | 41 upregulated substrates |
| 3 | IGF1R* | 2.716398e-08 | 1.829042e-06 | 14 upregulated substrates |
| 4 | MAPK1* | 2.204606e-07 | 9.068485e-06 | 26 upregulated substrates |
| 5 | MAPK14* | 2.709145e-07 | 9.068485e-06 | 29 upregulated substrates |
| 6 | PRKCA* | 2.715555e-07 | 9.068485e-06 | 31 upregulated substrates |
| 7 | MAP3K1* | 3.502413e-07 | 9.068485e-06 | 6 upregulated substrates |
| 8 | PRKCB* | 3.591479e-07 | 9.068485e-06 | 22 upregulated substrates |
| 9 | SRC* | 6.571651e-07 | 1.474971e-05 | 22 upregulated substrates |
| 10 | EGFR* | 1.629895e-06 | 3.292388e-05 | 11 upregulated substrates |
| 11 | MAPK3* | 3.469925e-06 | 6.130564e-05 | 18 upregulated substrates |
| 12 | RPS6KA3* | 3.641919e-06 | 6.130564e-05 | 24 upregulated substrates |
| 13 | SYK* | 4.487676e-06 | 6.973157e-05 | 8 upregulated substrates |
| 14 | CDK1* | 7.924293e-06 | 1.143362e-04 | 30 upregulated substrates |
| 15 | PAK1* | 2.259711e-05 | 3.043077e-04 | 8 upregulated substrates |
| 16 | LCK* | 2.774527e-05 | 3.502841e-04 | 11 upregulated substrates |
| 17 | LYN* | 3.720993e-05 | 4.421416e-04 | 10 upregulated substrates |
| 18 | PRKCD* | 5.866212e-05 | 6.583193e-04 | 11 upregulated substrates |
| 19 | GSK3B* | 6.762070e-05 | 6.877256e-04 | 29 upregulated substrates |
| 20 | CSNK2A2* | 6.809165e-05 | 6.877256e-04 | 17 upregulated substrates |
| 21 | MAPK9* | 1.121930e-04 | 1.079190e-03 | 12 upregulated substrates |
| 22 | PRKACA* | 1.556999e-04 | 1.429608e-03 | 23 upregulated substrates |
| 23 | ROCK1* | 2.076179e-04 | 1.823426e-03 | 6 upregulated substrates |
| 24 | SGK3* | 2.306062e-04 | 1.940936e-03 | 4 upregulated substrates |
| 25 | SGK1* | 2.860809e-04 | 2.311534e-03 | 8 upregulated substrates |
| 26 | MAPK10* | 3.010921e-04 | 2.339254e-03 | 10 upregulated substrates |
| 27 | CAMK2A* | 4.168418e-04 | 3.118594e-03 | 9 upregulated substrates |
| 28 | PRKACG* | 5.001862e-04 | 3.529580e-03 | 13 upregulated substrates |
| 29 | HCK* | 5.067219e-04 | 3.529580e-03 | 6 upregulated substrates |
| 30 | INSR* | 5.428695e-04 | 3.655321e-03 | 10 upregulated substrates |
| 31 | FYN* | 6.235543e-04 | 4.063160e-03 | 10 upregulated substrates |
| 32 | RPS6KA1* | 3.272161e-05 | 4.122922e-03 | 9 downregulated substrates |
| 33 | CDK14* | 6.586975e-04 | 4.158028e-03 | 6 upregulated substrates |
| 34 | ERBB2* | 7.089724e-04 | 4.339770e-03 | 4 upregulated substrates |
| 35 | PIM1* | 8.936428e-04 | 5.309290e-03 | 4 upregulated substrates |
| 36 | MAPKAPK2* | 1.066534e-03 | 6.155425e-03 | 6 upregulated substrates |
| 37 | MAP3K8* | 1.109747e-03 | 6.226913e-03 | 4 upregulated substrates |
| 38 | CSNK1A1* | 1.192183e-03 | 6.508672e-03 | 8 upregulated substrates |
| 39 | CSNK1E* | 1.424385e-03 | 7.571732e-03 | 10 upregulated substrates |
| 40 | JAK2* | 1.463060e-03 | 7.577899e-03 | 5 upregulated substrates |
| 41 | STK4* | 1.634778e-03 | 8.150391e-03 | 3 upregulated substrates |
| 42 | MAPK8* | 1.654287e-03 | 8.150391e-03 | 14 upregulated substrates |
| 43 | PRKDC* | 1.961467e-03 | 9.433723e-03 | 12 upregulated substrates |
| 44 | MAP3K5* | 2.206288e-03 | 1.036442e-02 | 3 upregulated substrates |
| 45 | CSNK2A1* | 2.382484e-03 | 1.093777e-02 | 16 upregulated substrates |
| 46 | ABL1* | 2.452468e-03 | 1.100886e-02 | 8 upregulated substrates |
| 47 | ROCK2* | 2.756202e-03 | 1.210332e-02 | 4 upregulated substrates |
| 48 | MTOR* | 3.350695e-03 | 1.440086e-02 | 7 upregulated substrates |
| 49 | KSR2* | 3.619519e-03 | 1.462286e-02 | 2 upregulated substrates |
| 50 | MAP3K2* | 3.619519e-03 | 1.462286e-02 | 2 upregulated substrates |
B. ARCHS4 (coexpressed genes)¶
| Rank | Kinase | P-value | FDR | Substrate |
|---|---|---|---|---|
| 1 | STK40 human kinase ARCHS4 coexpression* | 2.269103e-84 | 1.050595e-81 | 99 downregulated substrates |
| 2 | LIMK2 human kinase ARCHS4 coexpression* | 6.414466e-79 | 1.484949e-76 | 95 downregulated substrates |
| 3 | PRKD2 human kinase ARCHS4 coexpression* | 3.064599e-76 | 3.547273e-74 | 93 downregulated substrates |
| 4 | HCK human kinase ARCHS4 coexpression* | 3.064599e-76 | 3.547273e-74 | 93 downregulated substrates |
| 5 | PRKCD human kinase ARCHS4 coexpression* | 1.362245e-73 | 1.051199e-71 | 91 downregulated substrates |
| 6 | NUAK2 human kinase ARCHS4 coexpression* | 1.362245e-73 | 1.051199e-71 | 91 downregulated substrates |
| 7 | MAP3K3 human kinase ARCHS4 coexpression* | 2.794815e-72 | 1.617499e-70 | 90 downregulated substrates |
| 8 | FGR human kinase ARCHS4 coexpression* | 2.794815e-72 | 1.617499e-70 | 90 downregulated substrates |
| 9 | RPS6KA1 human kinase ARCHS4 coexpression* | 1.113436e-69 | 5.155210e-68 | 88 downregulated substrates |
| 10 | RIPK3 human kinase ARCHS4 coexpression* | 1.113436e-69 | 5.155210e-68 | 88 downregulated substrates |
| 11 | GRK6 human kinase ARCHS4 coexpression* | 2.161708e-68 | 8.340588e-67 | 87 downregulated substrates |
| 12 | MAP3K11 human kinase ARCHS4 coexpression* | 2.161708e-68 | 8.340588e-67 | 87 downregulated substrates |
| 13 | FES human kinase ARCHS4 coexpression* | 4.119596e-67 | 1.362409e-65 | 86 downregulated substrates |
| 14 | MLKL human kinase ARCHS4 coexpression* | 4.119596e-67 | 1.362409e-65 | 86 downregulated substrates |
| 15 | RAF1 human kinase ARCHS4 coexpression* | 1.414385e-64 | 4.092876e-63 | 84 downregulated substrates |
| 16 | LYN human kinase ARCHS4 coexpression* | 1.414385e-64 | 4.092876e-63 | 84 downregulated substrates |
| 17 | MAPK14 human kinase ARCHS4 coexpression* | 2.547587e-63 | 6.938428e-62 | 83 downregulated substrates |
| 18 | TYK2 human kinase ARCHS4 coexpression* | 1.327508e-59 | 3.414645e-58 | 80 downregulated substrates |
| 19 | GRK2 human kinase ARCHS4 coexpression* | 2.214316e-58 | 5.126142e-57 | 79 downregulated substrates |
| 20 | CSK human kinase ARCHS4 coexpression* | 2.214316e-58 | 5.126142e-57 | 79 downregulated substrates |
| 21 | MAP2K3 human kinase ARCHS4 coexpression* | 5.809813e-56 | 1.280926e-54 | 77 downregulated substrates |
| 22 | STK10 human kinase ARCHS4 coexpression* | 9.136189e-55 | 1.839155e-53 | 76 downregulated substrates |
| 23 | JAK3 human kinase ARCHS4 coexpression* | 9.136189e-55 | 1.839155e-53 | 76 downregulated substrates |
| 24 | STRADB human kinase ARCHS4 coexpression* | 1.408363e-53 | 2.716967e-52 | 75 downregulated substrates |
| 25 | MAPKAPK3 human kinase ARCHS4 coexpression* | 3.150631e-51 | 5.834969e-50 | 73 downregulated substrates |
| 26 | PTK2B human kinase ARCHS4 coexpression* | 4.570861e-50 | 8.139649e-49 | 72 downregulated substrates |
| 27 | MAP3K1 human kinase ARCHS4 coexpression* | 4.570861e-50 | 2.125450e-47 | 72 upregulated substrates |
| 28 | MKNK1 human kinase ARCHS4 coexpression* | 9.044150e-48 | 1.550904e-46 | 70 downregulated substrates |
| 29 | STK38 human kinase ARCHS4 coexpression* | 1.233077e-46 | 1.968672e-45 | 69 downregulated substrates |
| 30 | IRAK3 human kinase ARCHS4 coexpression* | 1.233077e-46 | 1.968672e-45 | 69 downregulated substrates |
| 31 | PIM1 human kinase ARCHS4 coexpression* | 1.646150e-45 | 2.458605e-44 | 68 downregulated substrates |
| 32 | ARAF human kinase ARCHS4 coexpression* | 1.646150e-45 | 2.458605e-44 | 68 downregulated substrates |
| 33 | IKBKE human kinase ARCHS4 coexpression* | 3.445372e-42 | 4.985023e-41 | 65 downregulated substrates |
| 34 | ATM human kinase ARCHS4 coexpression* | 2.752176e-43 | 6.398809e-41 | 66 upregulated substrates |
| 35 | JAK1 human kinase ARCHS4 coexpression* | 3.445372e-42 | 5.340327e-40 | 65 upregulated substrates |
| 36 | PAK2 human kinase ARCHS4 coexpression* | 4.220046e-41 | 4.905804e-39 | 64 upregulated substrates |
| 37 | MARK2 human kinase ARCHS4 coexpression* | 5.056246e-40 | 6.885418e-39 | 63 downregulated substrates |
| 38 | SYK human kinase ARCHS4 coexpression* | 5.056246e-40 | 6.885418e-39 | 63 downregulated substrates |
| 39 | SNRK human kinase ARCHS4 coexpression* | 5.056246e-40 | 4.702309e-38 | 63 upregulated substrates |
| 40 | STK17B human kinase ARCHS4 coexpression* | 5.924813e-39 | 3.935768e-37 | 62 upregulated substrates |
| 41 | MAP3K2 human kinase ARCHS4 coexpression* | 5.924813e-39 | 3.935768e-37 | 62 upregulated substrates |
| 42 | STRADA human kinase ARCHS4 coexpression* | 6.788262e-38 | 8.730459e-37 | 61 downregulated substrates |
| 43 | ERN1 human kinase ARCHS4 coexpression* | 6.788262e-38 | 8.730459e-37 | 61 downregulated substrates |
| 44 | STK4 human kinase ARCHS4 coexpression* | 6.788262e-38 | 3.945677e-36 | 61 upregulated substrates |
| 45 | MAP4K2 human kinase ARCHS4 coexpression* | 7.602891e-37 | 9.263522e-36 | 60 downregulated substrates |
| 46 | LTK human kinase ARCHS4 coexpression* | 7.602891e-37 | 9.263522e-36 | 60 downregulated substrates |
| 47 | RIOK3 human kinase ARCHS4 coexpression* | 8.322044e-36 | 9.879760e-35 | 59 downregulated substrates |
| 48 | PRKCH human kinase ARCHS4 coexpression* | 8.322044e-36 | 3.869751e-34 | 59 upregulated substrates |
| 49 | ROCK1 human kinase ARCHS4 coexpression* | 8.322044e-36 | 3.869751e-34 | 59 upregulated substrates |
| 50 | MAST3 human kinase ARCHS4 coexpression* | 8.900276e-35 | 9.811494e-34 | 58 downregulated substrates |
** Table 6 | Kinase Enrichment Analysis Results. **The figure contains browsable tables displaying the results of the Protein Kinase (PK) enrichment analysis generated using Enrichr. Every row represents a PK; significant PKs are highlighted in bold. A displays results generated using KEA, indicating PKs whose experimentally validated substrates are enriched. B displays results generated using the ARCHS4 library, indicating PKs whose top coexpressed genes (according to the ARCHS4 dataset) are enriched.
13. L1000CDS2 Query¶
L1000CDS2 is a web-based tool for querying gene expression signatures against signatures created from human cell lines treated with over 20,000 small molecules and drugs for the LINCS project. It is commonly used to identify small molecules which mimic or reverse the effects of a gene expression signature generated from a differential gene expression analysis.
# Initialize results
results['l1000cds2'] = {}
# Loop through signatures
for label, signature in signatures.items():
    # Run analysis
    results['l1000cds2'][label] = analyze(signature=signature, tool='l1000cds2', signature_label=label, plot_type='interactive')
    # Display results
    plot(results['l1000cds2'][label])
Control vs Perturbation signature:¶
L1000CDS2 Links:
Mimic Signature Query Results: https://maayanlab.cloud/L1000CDS2/#/result/6951a376da44360053834b37
Reverse Signature Query Results: https://maayanlab.cloud/L1000CDS2/#/result/6951a376da44360053834b39
** Figure 8 | L1000CDS2 Query results. **The figure contains an interactive bar chart displaying the top small molecules identified by the L1000CDS2 query. The left panel displays the small molecules which mimic the observed gene expression signature, while the right panel displays the small molecules which reverse it. Links to the L1000CDS2 web server are additionally provided, allowing users to interactively explore the analysis results. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
14. L1000FWD Query¶
L1000FWD is a web-based tool for querying gene expression signatures against signatures created from human cell lines treated with over 20,000 small molecules and drugs for the LINCS project.
# Initialize results
results['l1000fwd'] = {}
# Loop through signatures
for label, signature in signatures.items():
    # Run analysis
    results['l1000fwd'][label] = analyze(signature=signature, tool='l1000fwd', signature_label=label)
    # Display results
    plot(results['l1000fwd'][label])
** Similar Signatures: **
| | Signature ID | P-value | FDR | Z-score | Combined Score |
|---|---|---|---|---|---|
| 1 | CPC006_RMUGS_6H:BRD-K00088062-001-02-1:40 | 0.000005 | 0.032997 | -1.668304 | 8.788569 |
| 2 | CPC017_HT29_6H:BRD-K09549677-300-01-8:10 | 0.000014 | 0.048747 | -1.801399 | 8.767109 |
| 3 | CPC006_A375_6H:BRD-K20755323-001-02-6:40 | 0.000020 | 0.056127 | -1.658470 | 7.805216 |
| 4 | CPC019_VCAP_6H:BRD-K94544211-001-01-2:10 | 0.000033 | 0.075523 | -1.834169 | 8.210300 |
| 5 | CPC006_VCAP_6H:BRD-A79768653-001-02-1:10 | 0.000094 | 0.138704 | -1.672292 | 6.734402 |
| 6 | CPC012_SKB_24H:BRD-K03644760-001-01-5:10 | 0.000170 | 0.191642 | -1.725651 | 6.504429 |
| 7 | CPC016_A375_6H:BRD-A17065207-001-06-9:10 | 0.000343 | 0.312032 | -1.814992 | 6.289378 |
| 8 | CPC012_HT29_6H:BRD-K82971429-001-01-9:10 | 0.000389 | 0.333313 | -1.785462 | 6.087913 |
| 9 | CPC006_A375_6H:BRD-A18763547-300-04-8:10 | 0.000495 | 0.378274 | -1.637077 | 5.411425 |
| 10 | CPC014_HCC515_6H:BRD-K72420232-001-01-6:10 | 0.000718 | 0.480366 | -1.745217 | 5.486585 |
| 11 | CPC019_A375_6H:BRD-K05197617-001-05-8:10 | 0.001039 | 0.529745 | -1.783683 | 5.321065 |
| 12 | HOG002_MCF7_24H:BRD-K20755323-001-02-6:10 | 0.001079 | 0.537760 | -1.801748 | 5.345398 |
| 13 | CPC006_PC3_24H:BRD-K23875128-001-04-2:10 | 0.001130 | 0.545670 | -1.694631 | 4.994029 |
| 14 | CPC003_PC3_6H:BRD-K37691127-001-02-2:10 | 0.001142 | 0.545670 | -1.656374 | 4.873610 |
| 15 | CPC006_HT29_6H:BRD-K41087962-001-01-7:0.63 | 0.001733 | 0.656401 | -1.626943 | 4.492304 |
| 16 | CPC011_A549_24H:BRD-A23359898-001-06-2:10 | 0.001857 | 0.667857 | -1.708746 | 4.667107 |
| 17 | CPC009_HT29_6H:BRD-K53732802-019-01-9:10 | 0.002394 | 0.770521 | -1.684286 | 4.414343 |
| 18 | CPC003_VCAP_6H:BRD-K36007650-300-02-3:10 | 0.002676 | 0.812540 | -1.628237 | 4.188592 |
| 19 | CPC006_HT29_24H:BRD-K76703230-001-01-3:0.31 | 0.002870 | 0.853255 | -1.684035 | 4.280975 |
| 20 | CPC016_NPC_24H:BRD-K81729199-001-01-0:10 | 0.002906 | 0.858021 | -1.729485 | 4.387131 |
| 21 | CPC019_PC3_6H:BRD-K60070073-001-02-3:10 | 0.003347 | 0.924272 | -1.825995 | 4.520074 |
| 22 | CPC005_A549_24H:BRD-A47513740-001-02-5:10 | 0.005587 | 1.000000 | -1.662500 | 3.745287 |
| 23 | CPC006_A375_6H:BRD-A43331270-001-01-6:10 | 0.005603 | 1.000000 | -1.677913 | 3.778014 |
| 24 | CPC011_HT29_6H:BRD-K37270826-001-20-1:10 | 0.005857 | 1.000000 | -1.747471 | 3.900950 |
| 25 | CPC014_VCAP_6H:BRD-K50168500-001-01-2:10 | 0.006720 | 1.000000 | -1.767657 | 3.840506 |
| 26 | CPC004_HA1E_24H:BRD-K70327191-001-01-4:10 | 0.006873 | 1.000000 | -1.609854 | 3.481865 |
| 27 | CPC007_PC3_24H:BRD-K49814456-001-09-2:10 | 0.007447 | 1.000000 | -1.691662 | 3.599838 |
| 28 | CPC009_VCAP_24H:BRD-A05565054-001-01-7:10 | 0.007447 | 1.000000 | -1.698480 | 3.614348 |
| 29 | CPC010_PC3_6H:BRD-K30697463-001-15-0:10 | 0.008787 | 1.000000 | -1.683134 | 3.460828 |
| 30 | CPC020_A375_6H:BRD-A88774919-001-02-8:10 | 0.009000 | 1.000000 | -1.774839 | 3.630912 |
| 31 | CPC006_PC3_24H:BRD-K61662457-001-02-2:20 | 0.009397 | 1.000000 | -1.667230 | 3.379500 |
| 32 | CPC019_PC3_6H:BRD-K84106030-001-01-1:10 | 0.009788 | 1.000000 | -1.775378 | 3.567239 |
| 33 | CPC001_HCC515_24H:BRD-K12906962-001-02-1:10 | 0.009880 | 1.000000 | -1.633461 | 3.275520 |
| 34 | CPC019_HT29_6H:BRD-K45253154-001-04-3:10 | 0.010128 | 1.000000 | -1.787531 | 3.565171 |
| 35 | CPC005_PC3_24H:BRD-K43405658-001-01-8:10 | 0.010272 | 1.000000 | -1.638462 | 3.257810 |
| 36 | CPC018_HEPG2_6H:BRD-K96799727-001-01-7:10 | 0.011449 | 1.000000 | -1.787635 | 3.470222 |
| 37 | CPC006_MCF7_6H:BRD-K89732114-001-05-9:10 | 0.011567 | 1.000000 | -1.641471 | 3.179177 |
| 38 | CPC016_SKB_24H:BRD-K38477985-001-01-8:10 | 0.015187 | 1.000000 | -1.731162 | 3.148143 |
| 39 | CPC012_HT29_6H:BRD-K98004941-001-01-0:10 | 0.015188 | 1.000000 | -1.730055 | 3.146126 |
| 40 | CPC002_HA1E_24H:BRD-K60640630-001-03-7:10 | 0.015555 | 1.000000 | -1.586161 | 2.867982 |
| 41 | CPC012_ASC_24H:BRD-K88556033-001-01-0:10 | 0.015895 | 1.000000 | -1.711952 | 3.079334 |
| 42 | LJP001_BT20_24H:BRD-K49810818-001-02-8:10 | 0.015902 | 1.000000 | -1.760592 | 3.166501 |
| 43 | CPC015_MCF7_24H:BRD-K42635745-001-02-4:10 | 0.016629 | 1.000000 | -1.767551 | 3.144702 |
| 44 | CPC008_HT29_6H:BRD-K86992982-001-04-4:10 | 0.017780 | 1.000000 | -1.661370 | 2.907526 |
| 45 | CPC014_VCAP_24H:BRD-K81528515-001-03-9:10 | 0.018591 | 1.000000 | -1.790000 | 3.097938 |
| 46 | CPC014_A549_6H:BRD-K33551950-001-01-6:10 | 0.019447 | 1.000000 | -1.781109 | 3.047753 |
| 47 | CPC011_VCAP_6H:BRD-K60770992-066-21-9:10 | 0.023998 | 1.000000 | -1.684888 | 2.729238 |
| 48 | CPC008_VCAP_6H:BRD-K41185612-001-01-3:10 | 0.029783 | 1.000000 | -1.679521 | 2.562994 |
| 49 | CPC015_HEPG2_6H:BRD-K37991163-003-06-8:10 | 0.031160 | 1.000000 | -1.701976 | 2.563856 |
| 50 | CPC004_VCAP_24H:BRD-A28746609-001-05-7:10 | 0.031772 | 1.000000 | -1.586507 | 2.376526 |
** Opposite Signatures: **
| | Signature ID | P-value | FDR | Z-score | Combined Score |
|---|---|---|---|---|---|
| 1 | CPC013_SKB_24H:BRD-K41925105-001-01-6:10 | 9.377789e-10 | 0.000028 | 1.856747 | -16.762527 |
| 2 | CPC014_HT29_6H:BRD-A80960055-001-01-7:10 | 6.296264e-07 | 0.005391 | 1.744277 | -10.816115 |
| 3 | CPC005_A375_6H:BRD-A18419789-001-01-4:10 | 6.851862e-06 | 0.033413 | 1.860728 | -9.609156 |
| 4 | PCLB002_HEPG2_24H:BRD-K02130563:0.37 | 7.024576e-06 | 0.033413 | 1.615133 | -8.323392 |
| 5 | CPC006_T3M10_6H:BRD-K06792661-001-01-9:10 | 1.242533e-05 | 0.048747 | 1.869785 | -9.172591 |
| 6 | CPC019_HT29_6H:BRD-K98426715-001-01-9:10 | 1.803479e-05 | 0.055147 | 1.771416 | -8.403399 |
| 7 | CPC014_HT29_6H:BRD-K85493820-001-01-6:10 | 2.421599e-05 | 0.060980 | 1.736553 | -8.015753 |
| 8 | CPC004_HT29_6H:BRD-A25687296-300-03-5:10 | 3.351935e-05 | 0.075523 | 1.820056 | -8.144213 |
| 9 | CPC014_HA1E_6H:BRD-A80960055-001-01-7:10 | 4.437048e-05 | 0.090450 | 1.723755 | -7.503342 |
| 10 | CPC002_PC3_6H:BRD-A80502530-001-01-2:10 | 4.997552e-05 | 0.093067 | 1.885537 | -8.110152 |
| 11 | CPC004_PC3_6H:BRD-K98490050-001-01-8:10 | 5.000233e-05 | 0.093067 | 1.821365 | -7.833711 |
| 12 | CPC009_A375_6H:BRD-K03857568-001-14-0:10 | 6.350874e-05 | 0.108924 | 1.837353 | -7.711676 |
| 13 | CPC014_HT29_6H:BRD-K80622725-001-10-2:10 | 6.615459e-05 | 0.108924 | 1.776753 | -7.425833 |
| 14 | CPC007_HT29_6H:BRD-K78843060-019-02-0:10 | 1.113100e-04 | 0.153712 | 1.890195 | -7.472822 |
| 15 | CPC004_PC3_24H:BRD-A35588707-001-03-0:10 | 1.342052e-04 | 0.179537 | 1.849252 | -7.160732 |
| 16 | CPC013_HA1E_6H:BRD-K80346834-001-01-5:10 | 1.393792e-04 | 0.180809 | 1.755700 | -6.769631 |
| 17 | CPC010_PC3_6H:BRD-K28916077-001-04-0:10 | 1.502656e-04 | 0.189198 | 1.757476 | -6.719076 |
| 18 | PCLB002_HEPG2_24H:BRD-K02130563:10 | 1.640911e-04 | 0.191642 | 1.626646 | -6.156718 |
| 19 | LJP001_SKBR3_6H:BRD-K99252563-001-01-1:10 | 2.330728e-04 | 0.232038 | 1.651500 | -5.999089 |
| 20 | CPC006_SW620_6H:BRD-K06792661-001-01-9:10 | 3.087887e-04 | 0.293754 | 1.787439 | -6.274516 |
| 21 | CPC014_A549_24H:BRD-A26002865-001-01-5:10 | 3.542609e-04 | 0.315949 | 1.715859 | -5.920876 |
| 22 | CPC002_PC3_24H:BRD-K08547377-003-03-2:10 | 3.804512e-04 | 0.332382 | 1.871616 | -6.400367 |
| 23 | CPC004_A375_6H:BRD-K98490050-001-01-8:10 | 4.077091e-04 | 0.335647 | 1.856805 | -6.293917 |
| 24 | CPC019_A375_6H:BRD-K98824517-001-06-4:10 | 4.853755e-04 | 0.377790 | 1.705292 | -5.651205 |
| 25 | CPC015_A549_24H:BRD-K92093830-003-05-0:10 | 5.112866e-04 | 0.383994 | 1.705879 | -5.614621 |
| 26 | CPC012_MCF7_24H:BRD-K74761218-001-03-5:10 | 5.714976e-04 | 0.414665 | 1.757586 | -5.699826 |
| 27 | CPC016_NPC_24H:BRD-A19037878:10 | 6.732211e-04 | 0.464837 | 1.712722 | -5.432482 |
| 28 | CPC016_MCF7_6H:BRD-K91370081-001-10-3:10 | 7.050422e-04 | 0.479082 | 1.729834 | -5.452064 |
| 29 | CPC010_PC3_24H:BRD-A24643465-001-05-3:10 | 7.415868e-04 | 0.487843 | 1.754433 | -5.491091 |
| 30 | CPC011_VCAP_24H:BRD-A11990600-001-02-6:10 | 7.977059e-04 | 0.487843 | 1.733547 | -5.370800 |
| 31 | CPC006_U937_6H:BRD-K78126613-001-16-0:10 | 8.967437e-04 | 0.518766 | 1.775656 | -5.411014 |
| 32 | CPC010_A375_6H:BRD-K93034159-001-20-9:10 | 9.450331e-04 | 0.529745 | 1.763403 | -5.333505 |
| 33 | CPC014_MCF7_6H:BRD-A26002865-001-01-5:10 | 9.544044e-04 | 0.529745 | 1.726930 | -5.215792 |
| 34 | CPC016_A375_6H:BRD-K08547377-003-03-2:10 | 1.025448e-03 | 0.529745 | 1.737293 | -5.192920 |
| 35 | CPC006_TYKNU_6H:BRD-A36630025-001-02-6:0.35 | 1.332391e-03 | 0.582024 | 1.779220 | -5.115912 |
| 36 | CPC017_A549_6H:BRD-K06426971-001-01-9:10 | 1.543943e-03 | 0.650376 | 1.662098 | -4.672770 |
| 37 | CPC007_HT29_6H:BRD-K03067624-003-19-3:10 | 1.549637e-03 | 0.650376 | 1.796025 | -5.046418 |
| 38 | CPC002_PC3_6H:BRD-A69960130-066-01-4:10 | 1.596569e-03 | 0.651934 | 1.838885 | -5.143015 |
| 39 | CPC002_PC3_24H:BRD-K34608650-001-01-6:10 | 1.644721e-03 | 0.651934 | 1.820951 | -5.069359 |
| 40 | CPC006_A375_6H:BRD-K86682249-001-05-7:10 | 1.694118e-03 | 0.654016 | 1.812360 | -5.022152 |
| 41 | CPC006_SW620_6H:BRD-K04853698-003-01-4:10 | 1.694118e-03 | 0.654016 | 1.761222 | -4.880446 |
| 42 | CVD001_HEPG2_6H:BRD-K06426971-001-01-9:10 | 1.744786e-03 | 0.656401 | 1.672288 | -4.612600 |
| 43 | CPC006_HA1E_24H:BRD-A02481876-001-09-9:60 | 1.834833e-03 | 0.667857 | 1.772045 | -4.849031 |
| 44 | CPC006_RMUGS_6H:BRD-K15409150-001-02-5:30 | 1.941735e-03 | 0.686973 | 1.737236 | -4.711054 |
| 45 | CPC014_A375_6H:BRD-K33551950-001-01-6:10 | 2.827782e-03 | 0.852497 | 1.706767 | -4.349787 |
| 46 | LJP001_SKBR3_6H:BRD-K64890080-001-09-6:10 | 2.870162e-03 | 0.853255 | 1.612652 | -4.099511 |
| 47 | CPC020_PC3_6H:BRD-K58306044-001-01-3:10 | 3.035519e-03 | 0.878024 | 1.660358 | -4.180394 |
| 48 | CPC006_PC3_6H:BRD-K43620258-001-01-6:80 | 3.298653e-03 | 0.924272 | 1.799578 | -4.465947 |
| 49 | CPC014_A549_6H:BRD-K81142122-001-15-8:10 | 3.484586e-03 | 0.956228 | 1.685972 | -4.143864 |
| 50 | CPC010_MCF7_6H:BRD-A24643465-001-05-3:10 | 9.083642e-03 | 1.000000 | 1.694358 | -3.459439 |
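The signature IDs in the two tables above appear to follow a `BATCH_CELL_TIME:PERTURBAGEN:DOSE` layout, and the Combined Score column is numerically consistent with the product of log10(P-value) and the Z-score. Both observations are inferred from the displayed rows rather than stated by the notebook, so the following is only a minimal sketch under those assumptions. The names `Signature`, `parse_signature_id`, and `combined_score` are hypothetical helpers, not part of BioJupies or L1000FWD.

```python
import math
from typing import NamedTuple


class Signature(NamedTuple):
    """Fields of a signature ID (field meanings are an assumption)."""
    batch: str
    cell_line: str
    time_point: str
    perturbagen_id: str
    dose: float  # unit is not stated in the table


def parse_signature_id(sig_id):
    """Split 'BATCH_CELL_TIME:PERTURBAGEN:DOSE' into named fields."""
    experiment, perturbagen_id, dose = sig_id.split(":")
    batch, cell_line, time_point = experiment.split("_")
    return Signature(batch, cell_line, time_point, perturbagen_id, float(dose))


def combined_score(p_value, z_score):
    """log10(p) * z, which reproduces the tabulated values to rounding error."""
    return math.log10(p_value) * z_score


# Top row of the "Opposite Signatures" table
top_opposite = parse_signature_id("CPC013_SKB_24H:BRD-K41925105-001-01-6:10")
top_opposite_score = combined_score(9.377789e-10, 1.856747)
```

For the top opposite signature this gives a cell line of `SKB`, a time point of `24H`, and a score within rounding error of the tabulated -16.762527. Under this reading, similar signatures (negative Z-score) get positive scores and opposite signatures (positive Z-score) get negative ones, matching the sign pattern in both tables.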
Methods¶
Data¶
Data Source¶
The dataset was user-submitted, compressed in an HDF5 data package, and uploaded to Google Cloud.
Data Normalization¶
Quantile Normalization¶
Raw counts were normalized using quantile normalization from the DESeq2 R package.
Signature Generation¶
The gene expression signature was generated by comparing gene expression levels between the control group and the experimental group using the limma R package (Ritchie et al., Nucleic Acids Research 2015), available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/limma.html.
PCA¶
Principal Component Analysis was performed using the PCA function from the sklearn Python module. Prior to performing PCA, the raw gene counts were normalized using the quantile method, filtered by selecting the 500 genes with most variable expression, and finally transformed using the Z-score method.
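The preprocessing chain described above (quantile normalization, selection of the most variable genes, Z-scoring, then PCA) can be sketched as follows. This is an illustrative reconstruction, not the BioJupies source: the function names are hypothetical, the input is assumed to be a genes x samples NumPy array, ties in the quantile step are broken arbitrarily, and Z-scores are assumed to be computed per gene across samples.

```python
import numpy as np
from sklearn.decomposition import PCA


def quantile_normalize(counts):
    """Give every sample (column) the same empirical distribution."""
    order = np.argsort(counts, axis=0)      # per-column sort order
    ranks = np.argsort(order, axis=0)       # rank of each value in its column
    # Reference distribution: mean of the sorted columns at each rank
    mean_sorted = np.sort(counts, axis=0).mean(axis=1)
    return mean_sorted[ranks]


def pca_coordinates(counts, n_top_genes=500, n_components=3):
    """Return sample coordinates and explained variance ratios."""
    normalized = quantile_normalize(np.asarray(counts, dtype=float))
    # Keep the genes with the highest variance across samples
    top = np.argsort(normalized.var(axis=1))[::-1][:n_top_genes]
    subset = normalized[top]
    # Z-score each gene across samples, guarding against zero variance
    std = subset.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    zscored = (subset - subset.mean(axis=1, keepdims=True)) / std
    # sklearn expects observations (samples) as rows
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(zscored.T)
    return coords, pca.explained_variance_ratio_
```

Calling `pca_coordinates(counts)` on a matrix with at least three samples returns one row of coordinates per sample, ready to be plotted on the first two or three components.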
Clustergrammer¶
The interactive heatmap was generated using Clustergrammer (Fernandez et al., 2017) which is freely available at http://amp.pharm.mssm.edu/clustergrammer/. Prior to displaying the heatmap, the raw gene counts were normalized using the quantile method, filtered by selecting the 500 genes with most variable expression, and finally transformed using the Z-score method.
Library Size Analysis¶
Read counts were calculated by summing each column of the raw gene count matrix. Total counts were subsequently divided by 10^6 and displayed as millions of reads.
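As a small illustration of this calculation, the sketch below sums each sample column and scales by one million. It assumes the count matrix is a list of per-gene rows with one value per sample; the function name and the sample names are hypothetical.

```python
def library_sizes_in_millions(count_matrix, sample_names):
    """Sum raw counts per sample (column) and express them in millions of reads."""
    totals = [0] * len(sample_names)
    for gene_counts in count_matrix:
        for column, value in enumerate(gene_counts):
            totals[column] += value
    return {name: total / 1e6 for name, total in zip(sample_names, totals)}


# Two genes measured in two hypothetical samples
sizes = library_sizes_in_millions(
    [[1_000_000, 2_000_000], [500_000, 0]],
    ["control_1", "perturbation_1"],
)
```

Here `sizes` maps `control_1` to 1.5 and `perturbation_1` to 2.0 million reads.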
Differential Expression Table¶
The gene expression signature was generated by performing differential gene expression analysis using the methods described in the Differential Gene Expression section.
Volcano Plot¶
Gene fold changes were transformed using log2 and displayed on the x axis; P-values were corrected using the Benjamini-Hochberg method, transformed using –log10, and displayed on the y axis. See the Differential Gene Expression section for more information on the methods used to generate these values.
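The transformation described above can be sketched in plain Python. This is a minimal illustration rather than the notebook's implementation: the function names are hypothetical, fold changes are assumed to be untransformed ratios, and adjusted P-values of exactly zero are not handled.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def volcano_coordinates(fold_changes, pvalues):
    """Return (log2 fold change, -log10 adjusted P-value) pairs."""
    adjusted = benjamini_hochberg(pvalues)
    return [(math.log2(fc), -math.log10(q))
            for fc, q in zip(fold_changes, adjusted)]
```

For example, the P-values `[0.01, 0.04, 0.03, 0.005]` adjust to approximately `[0.02, 0.04, 0.04, 0.02]`, so a gene with a two-fold increase and a raw P-value of 0.01 would be plotted at roughly x = 1.0, y = 1.7.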
MA Plot¶
Average gene expression was identified by calculating the mean of the normalized gene expression values and displayed on the x axis; gene fold changes were transformed using log2 and displayed on the y axis. For more information on the methods used to generate the signature, see the Differential Gene Expression section.
Enrichr Links¶
The up-regulated and down-regulated gene sets were generated by extracting the 500 genes with the respectively highest and lowest values from the gene expression signature. The gene sets were subsequently submitted to Enrichr (Kuleshov et al., 2016), which is freely available at http://amp.pharm.mssm.edu/Enrichr/, using the gene set upload API. For more information on the methods used to generate the signature, see the Differential Gene Expression section.
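The gene set extraction step can be sketched as below. The signature is assumed to be a mapping from gene symbol to a single numeric value (which statistic is used is not stated here), and the function name is hypothetical. Submission to Enrichr is left out because it requires network access.

```python
def extract_gene_sets(signature, n_genes=500):
    """Pick the genes with the highest and lowest signature values."""
    ranked = sorted(signature, key=signature.get, reverse=True)
    return {
        "upregulated": ranked[:n_genes],
        # Reverse the tail so the most down-regulated gene comes first
        "downregulated": ranked[-n_genes:][::-1],
    }


# Toy signature with hypothetical gene names and values
gene_sets = extract_gene_sets(
    {"GENE_A": 3.0, "GENE_B": -2.0, "GENE_C": 0.5, "GENE_D": -1.0, "GENE_E": 2.0},
    n_genes=2,
)
```

With the toy signature, `gene_sets["upregulated"]` is `["GENE_A", "GENE_E"]` and `gene_sets["downregulated"]` is `["GENE_B", "GENE_D"]`. If the signature holds fewer than twice `n_genes` entries, the two sets overlap, so a real pipeline would need to guard against that case.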
Gene Ontology Enrichment Analysis¶
Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: GO_Biological_Process_2018, GO_Molecular_Function_2018, GO_Cellular_Component_2018. Significant terms are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.
Pathway Enrichment Analysis¶
Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: KEGG_2016, Reactome_2016, WikiPathways_2016. Significant terms are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.
Transcription Factor Enrichment Analysis¶
Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: ChEA_2016, ENCODE_TF_ChIP-seq_2015, ARCHS4_TFs_Coexp. Significant results are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.
Kinase Enrichment Analysis¶
Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: KEA_2015, ARCHS4_Kinases_Coexp. Significant results are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.
L1000CDS2 Query¶
The L1000CDS2 analysis (Duan et al., 2016) was performed by submitting the top 2000 genes in the gene expression signature to the L1000CDS2 signature search API. For more information on the methods used to generate the signature, see the Differential Gene Expression section.
L1000FWD Query¶
The L1000FWD analysis (Wang et al., 2018) was performed by submitting the top 2000 genes in the gene expression signature to the L1000FWD signature search API. For more information on the methods used to generate the signature, see the Differential Gene Expression section.
References¶
Duan, Q., Reid, S.P., Clark, N.R., Wang, Z., Fernandez, N.F., Rouillard, A.D., Readhead, B., Tritsch, S.R., Hodos, R., Hafner, M., et al. (2016). L1000CDS2: LINCS L1000 characteristic direction signatures search engine. Npj Systems Biology and Applications 2. doi: https://doi.org/10.1038/npjsba.2016.15
Fernandez, N.F., Gundersen, G.W., Rahman, A., Grimes, M.L., Rikova, K., Hornbeck, P., and Ma'ayan, A. (2017). Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data 4, 170151. doi: http://dx.doi.org/10.1038/sdata.2017.151
Kuleshov, M.V., Jones, M.R., Rouillard, A.D., Fernandez, N.F., Duan, Q., Wang, Z., Koplev, S., Jenkins, S.L., Jagodnik, K.M., Lachmann, A., et al. (2016). Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 44, W90–W97. doi: https://dx.doi.org/10.1093/nar/gkw377
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15. doi: http://doi.org/10.1186/s13059-014-0550-8
Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2, 559–572. doi: https://doi.org/10.1080/14786440109462720
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47–e47. doi: https://doi.org/10.1093/nar/gkv007
Wang, Z., Lachmann, A., Keenan, A.B., and Ma’ayan, A. (2018). L1000FWD: fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics. doi: https://doi.org/10.1093/bioinformatics/bty060